(54) Issuing dependent instructions in a data processing system based on speculative secondary cache hit

(57) A method for optimally issuing instructions that are related to a first instruction in a data processing system is disclosed. The processing system includes a primary and secondary cache. The method and system comprise speculatively indicating a hit of the first instruction in a secondary cache and releasing the dependent instructions. The method and system include determining if the first instruction is within the secondary cache. The method and system further include providing data related to the first instruction from the secondary cache to the primary cache when the instruction is within the secondary cache. A method and system in accordance with a preferred embodiment of the present invention cause instructions that create dependencies (such as a load instruction) to signal an issue queue (which is responsible for issuing instructions with resolved conflicts) in advance that the instruction will complete in a predetermined number of cycles. In an embodiment, a core interface unit (CIU) will signal an execution unit such as the Load Store Unit (LSU) that it is assumed that the instruction will hit in the L2 cache. An issue queue uses the signal to issue dependent instructions at an optimal time. If the instruction misses in the L2 cache, the cache hierarchy causes the instructions to be abandoned and re-executed when the data is available.
Description

[0001] The present invention relates generally to a super scalar processor and more particularly to optimally issuing dependent instructions in such a system.
[0002] Super scalar processors employ aggressive techniques to exploit instruction-level parallelism. Wide dispatch and issue paths place an upper bound on peak instruction throughput. Large issue buffers are used to maintain a window of instructions necessary for detecting parallelism, and a large pool of physical registers provides destinations for all of the in-flight instructions issued from the window beyond the dispatch boundary. To enable concurrent execution of instructions, the execution engine is composed of many parallel functional units. The fetch engine speculates past multiple branches in order to supply a continuous instruction stream to the decode, dispatch and execution pipelines, so as to maintain a large window of potentially executable instructions.

[0003] The trend in super scalar design is to scale these techniques: wider dispatch/issue, larger windows, more physical registers, more functional units, and deeper speculation. To maintain this trend, it is important to balance all parts of the processor, since any bottleneck diminishes the benefit of the aggressive techniques.
[0004] Instruction fetch performance depends on a number of factors. Instruction cache hit rate and branch prediction accuracy have long been recognized as important problems in fetch performance and are well-researched areas.

[0005] Modern microprocessors routinely use a plurality of mechanisms to improve their ability to efficiently fetch past branch instructions. These prediction mechanisms allow a processor to fetch beyond a branch instruction before the outcome of the branch is known. For example, some mechanisms allow a processor to speculatively fetch beyond a branch before the branch's target address has been computed. These techniques use run-time history to speculatively predict which instructions should be fetched and eliminate "dead" cycles that might normally be wasted waiting for the actual determination of the next instruction address. Even with these techniques, current microprocessors are limited in the number of instructions they can fetch during a clock cycle. As super scalar processors become more aggressive and attempt to execute many more instructions per cycle, they must also be able to fetch many more instructions per cycle.
[0006] High performance super scalar processor organizations divide naturally into an instruction fetch mechanism and an instruction execution mechanism. The fetch and execution mechanisms are separated by instruction issue buffer(s), for example, queues, reservation stations, etc. Conceptually, the instruction fetch mechanism acts as a "producer" which fetches, decodes, and places instructions into a reorder buffer. The instruction execution engine "prepares" instructions for completion. The completion engine is the "consumer" which removes instructions from the buffer and executes them, subject to data dependence and resource constraints. Control dependencies (branches and jumps) provide a feedback mechanism between the producer and consumer.

[0007] Dispatching and completion of instructions are typically in program order. However, issuance and execution are not necessarily in program order. An instruction is dispatched to an issue queue for a particular execution unit, or at least a particular type of execution unit (aka functional unit). A load/store unit is a type of functional unit for executing memory accesses. An issue queue issues an instruction to its functional unit responsive to the instruction's operands being available for execution, i.e., when results are available from any earlier dispatched instructions upon which the instruction is dependent.
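The issue rule of [0007] (dispatch in program order, issue whenever an instruction's operands are ready) can be sketched as a toy model. The simplifications are assumptions: register names stand in for operand tags, and the names `Instr`, `IssueQueue`, `dispatch`, `mark_ready` and `issue` are hypothetical, not taken from this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    srcs: tuple[str, ...]  # source registers this instruction reads


class IssueQueue:
    """Toy issue queue (hypothetical): entries wait until all source operands are ready."""

    def __init__(self) -> None:
        self.waiting: list[Instr] = []
        self.ready_regs: set[str] = set()

    def dispatch(self, instr: Instr) -> None:
        # Dispatch happens in program order.
        self.waiting.append(instr)

    def mark_ready(self, reg: str) -> None:
        # An earlier instruction has produced its result in `reg`.
        self.ready_regs.add(reg)

    def _operands_ready(self, instr: Instr) -> bool:
        return all(src in self.ready_regs for src in instr.srcs)

    def issue(self) -> list[str]:
        # Issue every ready instruction; this may be out of program order.
        issued = [i for i in self.waiting if self._operands_ready(i)]
        self.waiting = [i for i in self.waiting if not self._operands_ready(i)]
        return [i.name for i in issued]


q = IssueQueue()
q.mark_ready("r1")
q.dispatch(Instr("load r2,0(r1)", ("r1",)))
q.dispatch(Instr("add r3,r2,r2", ("r2",)))
q.dispatch(Instr("sub r4,r1,r1", ("r1",)))
first = q.issue()   # ['load r2,0(r1)', 'sub r4,r1,r1']: sub overtakes add
q.mark_ready("r2")  # the load's result becomes available
second = q.issue()  # ['add r3,r2,r2']
print(first, second)
```

The `add` that depends on the load stays queued until `r2` is marked ready, while the independent `sub` issues ahead of it.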

[0008] The present invention accordingly provides, in a first aspect, a method for optimally issuing instructions that are dependent on a first instruction in a data processing system, the processing system including a primary and secondary cache, the method comprising the steps of: (a) speculatively indicating a hit of the first instruction in a secondary cache and releasing the dependent instructions; (b) determining if the first instruction is within the secondary cache; and (c) providing data related to the first instruction and the dependent instructions from the secondary cache to the primary cache when the first instruction is within the secondary cache.

[0009] The first instruction preferably comprises a load instruction. The primary cache preferably comprises a data L1 cache. The secondary cache preferably comprises an L2 cache. It is further preferred to include the step of: (d) cancelling the load instruction and its dependent instructions when the first instruction is not within the L2 cache.

[0010] In a second aspect, the present invention provides a processor for optimally issuing instructions that are dependent on a first instruction comprising: an execution unit for issuing instructions; a primary cache coupled to the execution unit; a secondary cache; and a core interface unit coupled to the primary cache, the secondary cache and the execution unit, the core interface unit for providing a signal to the execution unit when a first instruction is not a hit in the primary cache, the signal causing the execution unit to guess that a hit of the first instruction has occurred in the secondary cache and speculatively release instructions that are dependent upon the first instruction.

[0011] In the processor of the second aspect, the first preferred features comprise components corresponding to the preferred features of the method of the first aspect. A secondary preferred feature is that the execution unit comprises a load store unit.

[0012] A processor of the second aspect preferably is incorporated in a system for optimally issuing instructions that are dependent on a first instruction in a data processing system, the processing system including a primary and secondary cache, the system comprising: means for speculatively indicating a hit of the first instruction in a secondary cache and releasing the dependent instructions; means for determining if the first instruction is within the secondary cache; and means for providing data related to the first instruction and the dependent instructions from the secondary cache to the primary cache when the first instruction is within the secondary cache. Further preferred features comprise means for performing the steps of the method of the first aspect.

[0013] In a high-speed, highly speculative processor, groups of instructions are issued based on interdependences. Some operations, such as Load instructions, can have variable and unpredictable latency, which makes interdependency analysis difficult. A solution is needed that improves the performance of instruction groups dependent on Load operands. More particularly, what is needed is a system and method for efficiently issuing dependent instructions in such a processor. The present invention addresses such a need.

[0014] A method for optimally issuing instructions that are related to a first instruction in a data processing system is disclosed. The processing system includes a primary and secondary cache. The method and system comprise speculatively indicating a hit of the first instruction in a secondary cache and releasing the dependent instructions. The method and system include determining if the first instruction is within the secondary cache. The method and system further include providing data related to the first instruction from the secondary cache to the primary cache when the instruction is within the secondary cache.

[0015] A method and system in accordance with the present invention cause instructions that create dependencies (such as a load instruction) to signal an issue queue (which is responsible for issuing instructions with resolved conflicts) in advance that the instruction will complete in a predetermined number of cycles. In an embodiment, a core interface unit (CIU) will signal an execution unit such as the Load Store Unit (LSU) that it is assumed that the instruction will hit in the L2 cache. An issue queue uses the signal to issue dependent instructions at an optimal time. If the instruction misses in the L2 cache, the cache hierarchy causes the instructions to be abandoned and re-executed when the data is available.

[0016] A preferred embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a conventional processor;

Figure 2 is a flow chart illustrating a conventional method for issuing dependent instructions in the processor of Figure 1;

Figure 3 is a block diagram of a processor in accordance with a preferred embodiment of the present invention; and

Figure 4 is a flow chart illustrating a method for issuing dependent instructions in a data processing system in accordance with a preferred embodiment of the present invention.

[0017] The present invention relates generally to a super scalar processor and more particularly to a system and method for improving the overall throughput in such a processor.

[0018] Figure 1 illustrates a processor 100. Processor 100 includes issue unit (ISU) 125, which will be described in detail below with reference to Figure 2. ISU 125 gives execution units 130, 140, and 150 the ability to reject instructions. Rejected instructions remain in ISU 125 to be reissued at a later time.
[0019] In the illustrative embodiment shown in Figure 1, processor 100 comprises a single integrated circuit super scalar microprocessor. Accordingly, processor 100 includes various execution units, registers, buffers, memory devices, and other functional units, which are all formed by integrated circuitry. Of course, although the invention is described herein as applied to a microprocessor, the present instruction-handling scheme is not limited to microprocessors and may be implemented in other types of processors.

[0020] As illustrated in Figure 1, processor 100 is coupled to system bus 113 via a core interface unit (CIU) 114 and processor bus 115. Both system bus 113 and processor bus 115 include address, data, and control buses which are not shown separately. CIU 114 participates in bus arbitration to control the transfer of information between processor 100 and other devices coupled to system bus 113, such as L2 cache 116 and main storage 117. The data processing system illustrated in Figure 1 preferably includes other devices coupled to system bus 113; however, these other devices are not necessary for an understanding of the invention and are accordingly omitted from the drawings.
[0021] CIU 114 is connected to instruction L1 cache 118 and data L1 cache 119. High-speed caches, such as instruction L1 cache 118 and data L1 cache 119, enable processor 100 to achieve relatively fast access times to a subset of data or instructions previously transferred from main memory 117 to the L2 cache 116 and then to the respective L1 cache 118 or 119, thus improving the overall processing speed. Data and instructions stored within the data cache 119 and instruction cache 118, respectively, are each identified and accessed by an effective address, which is related to the real address of the respective data or instructions in main memory 117.

[0022] Instruction L1 cache 118 is further coupled to sequential fetcher 120, which fetches instructions for execution from instruction L1 cache 118 during each processor cycle. Sequential fetcher 120 transmits branch instructions fetched from instruction L1 cache 118 to branch processing unit (BPU) 121 for execution, and temporarily stores sequential instructions within instruction queue 122 for eventual transfer to dispatch unit 124 for decoding and dispatch to the instruction issue unit (ISU) 125.

[0023] In the depicted illustrative embodiment, in addition to BPU 121, the execution circuitry of processor 100 comprises multiple execution units for executing sequential instructions, including fixed-point unit (FXU) 130, load-store unit (LSU) 140, and floating-point unit (FPU) 150. Each execution unit 130, 140, and 150 typically executes one or more instructions of a particular type during each processor cycle.
[0024] FXU 130 performs fixed-point mathematical and logical operations such as addition, subtraction, ANDing, ORing, and XORing, utilizing source operands received from specified general-purpose registers (GPRs) 132. Following the execution of a fixed-point instruction, FXU 130 outputs the data results of the instruction on result bus 128 to a GPR register file 133 associated with GPRs 132.

[0025] FPU 150 typically performs single and double-precision floating-point mathematical and logical operations, such as floating-point multiplication and division, on source operands received from floating-point registers (FPRs) 152. FPU 150 outputs data resulting from the execution of floating-point instructions on result bus 128 to an FPR register file 153, which temporarily stores the result data.

[0026] LSU 140 typically executes floating-point and fixed-point instructions which either load data from memory or store data to memory. For example, an LSU instruction may load data from either the data L1 cache 119 or an L2 cache 116 into selected GPRs 132 and FPRs 152. Other LSU instructions may store data from a selected GPR 132 or FPR 152 to the data L1 cache 119 and then to the L2 cache 116. The L2 cache includes an L2 cache directory 155 which holds the tags for the data which is within the L2 cache.
[0027] Processor 100 employs both pipeline and out-of-order execution of instructions to further improve the performance of its super scalar architecture. Instructions can be executed by FXU 130, LSU 140, and FPU 150 in any order as long as data dependencies are observed. Within individual execution units 130, 140, and 150, instructions are also processed in a sequence of pipeline stages unique to the particular execution unit.
[0028] During the fetch stage, sequential fetcher 120 retrieves one or more instructions associated with one or more memory addresses from instruction L1 cache 118. Sequential fetcher 120 stores sequential instructions fetched from instruction L1 cache 118 within instruction queue 122. Branch instructions are removed or folded out by sequential fetcher 120 to BPU 121 for execution. BPU 121 includes a branch prediction mechanism (not shown separately) which, in one embodiment, comprises a dynamic prediction mechanism such as a branch history table. This branch history table enables BPU 121 to speculatively execute unresolved conditional branch instructions by predicting whether or not the branch will be taken.
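[0028] names a branch history table as one form of dynamic prediction mechanism but does not specify its organization. A common organization, assumed here rather than taken from this document, is a table of 2-bit saturating counters indexed by low-order branch-address bits. The table size, the index function and the class name `BranchHistoryTable` are illustrative choices.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters (assumed organization).

    Counter values 0-1 predict not taken; 2-3 predict taken.
    """

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        self.counters = [1] * entries  # every entry starts "weakly not taken"

    def _index(self, branch_addr: int) -> int:
        # Drop the 2 low bits (4-byte instructions assumed), then wrap to the table size.
        return (branch_addr >> 2) % self.entries

    def predict(self, branch_addr: int) -> bool:
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
loop_branch = 0x4000
before = bht.predict(loop_branch)        # False: weakly not taken
for _ in range(3):
    bht.update(loop_branch, taken=True)  # counter saturates at 3
trained = bht.predict(loop_branch)       # True
bht.update(loop_branch, taken=False)     # a single loop exit: 3 -> 2
after_exit = bht.predict(loop_branch)    # still True
print(before, trained, after_exit)
```

The two-bit counter gives hysteresis: a single loop exit does not flip the prediction for the next execution of the loop.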

[0029] During the decode/dispatch stage, dispatch unit 124 decodes and dispatches one or more instructions from instruction queue 122 to ISU 125. ISU 125 includes a plurality of issue queues 134, 144, and 154, one issue queue for each execution unit 130, 140, and 150. ISU 125 also includes circuitry for receiving information from each execution unit 130, 140, and 150 and for controlling the issue queues 134, 144, and 154. According to a preferred embodiment of the invention, instructions for each respective execution unit 130, 140, and 150 are stored in the respective issue queue 134, 144, and 154, and then issued to the respective execution unit to be processed. However, instructions are dropped or removed from the issue queues 134, 144, or 154 only after the issued instruction is fully executed by the respective execution unit 130, 140, or 150.
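The retention rule in [0029], together with the reject capability described in [0018] and [0030], can be modelled as an issue queue whose entries survive issue and are freed only when the instruction is reported fully executed. The sketch below is hypothetical: the state names and the `RetainingIssueQueue` methods are illustrative, not the actual design of ISU 125.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    WAITING = auto()  # not yet issued, or rejected and awaiting reissue
    ISSUED = auto()   # sent to the execution unit; the entry is still held


class RetainingIssueQueue:
    """Hypothetical model: entries are freed only after full execution."""

    def __init__(self) -> None:
        self.entries: dict[str, State] = {}  # insertion order = dispatch order

    def dispatch(self, tag: str) -> None:
        self.entries[tag] = State.WAITING

    def issue_one(self) -> str | None:
        # Issue the oldest waiting entry, but keep it in the queue.
        for tag, state in self.entries.items():
            if state is State.WAITING:
                self.entries[tag] = State.ISSUED
                return tag
        return None

    def reject(self, tag: str) -> None:
        # The execution unit gave up on the instruction; make it eligible again.
        self.entries[tag] = State.WAITING

    def finish(self, tag: str) -> None:
        # Fully executed: only now is the entry removed.
        del self.entries[tag]


q = RetainingIssueQueue()
q.dispatch("ld1")
first = q.issue_one()      # 'ld1' issued; entry still present
held = "ld1" in q.entries  # True
q.reject("ld1")            # e.g. the data was not yet available
again = q.issue_one()      # 'ld1' reissued from the same entry
q.finish("ld1")
print(first, held, again, len(q.entries))  # ld1 True ld1 0
```

Because the entry is never released on issue, a rejected instruction needs no re-dispatch; it simply becomes eligible for issue again.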
[0030] During the execution stage, execution units 130, 140, and 150 execute instructions issued from their respective issue queues 134, 144, and 154. As will be described below, each execution unit according to a preferred embodiment of the invention may reject any issued instruction without fully executing the instruction. However, once the issued instructions are executed and that execution has terminated, execution units 130, 140, and 150 store the results, if any, within either GPRs 132 or FPRs 152, depending upon the instruction type. Execution units 130, 140, and 150 also notify completion unit 160 that the instructions have finished execution. Finally, instructions are completed in program order out of a completion buffer (not shown separately) associated with the completion unit 160. Instructions executed by FXU 130 are completed by releasing the old physical register associated with the destination GPR of the completed instructions in a GPR rename table (not shown). Instructions executed by FPU 150 are completed by releasing the old physical register associated with the destination FPR of the completed instructions in an FPR rename table (not shown). Load instructions executed by LSU 140 are completed by releasing the old physical register associated with the destination GPR or FPR of the completed instructions in the GPR or FPR rename table (not shown). Store instructions executed by LSU 140 are completed by marking the finished store instructions as completed in a store queue (not shown). Completed store instructions in the store queue will eventually be written to memory.

[0031] The preferred embodiment of the present invention will be described below with reference specifically to one execution unit, LSU 140, along with ISU 125 and issue queue 144. The present invention is not limited to the particular LSU operation described below. Other LSU pipeline stages as well as the pipeline stages performed by other execution units are to be considered equivalents to the illustrated examples.

[0032] The following illustrates the cycles of a typical LSU 140 pipeline:

Stage 0: RFL - Register File access cycle: read out GPR values for Load instruction operands, or receive bypass data from the L1 cache for an operand

Stage 1: AGN - Address Generation cycle: add operands together to create the Load data address

Stage 2: ACC - Access cycle: the L1 cache is addressed

Stage 3: RES - Results cycle: L1 cache data is available

Stage 4: FIN - Finish cycle: LSU Load completion signalled
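The five stages of [0032] can be written down as an enumeration whose values are cycle offsets from issue. Two assumptions are made that the text does not state: one stage per cycle with no stalls, and that a dependent instruction's RFL stage can take L1 data bypassed in the same cycle the load's RES stage produces it. Under those assumptions the earliest issue cycle for a dependent instruction on an L1 hit follows directly. The name `dependent_issue_cycle` is hypothetical.

```python
from enum import IntEnum


class LsuStage(IntEnum):
    """LSU 140 pipeline stages; value = cycle offset from issue (no stalls assumed)."""
    RFL = 0  # register file read, or bypass of L1 data into an operand
    AGN = 1  # address generation
    ACC = 2  # L1 cache accessed
    RES = 3  # L1 cache data available
    FIN = 4  # load completion signalled


def dependent_issue_cycle(load_issue_cycle: int) -> int:
    """Earliest issue cycle for an instruction that needs the load's data on an
    L1 hit, assuming its RFL stage can receive the data bypassed in the load's
    RES cycle."""
    return load_issue_cycle + (LsuStage.RES - LsuStage.RFL)


print([stage.name for stage in LsuStage])  # ['RFL', 'AGN', 'ACC', 'RES', 'FIN']
print(dependent_issue_cycle(10))           # 13
```

This fixed offset is what makes an L1 hit easy to schedule around; an L1 miss breaks it, which is the case the remainder of the description addresses.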

[0033] Figure 2 illustrates a conventional method for issuing dependent instructions in a data processing system for such a pipeline. Referring now to Figures 1 and 2 together, first an instruction such as a load instruction enters the LSU pipeline, via step 202. Next, it is determined whether the instruction is a hit in the data L1 cache 119, via step 204. If the instruction is a hit, then it is finished, via step 206. However, if the instruction is not in the data L1 cache, then the L2 tag is accessed in the L2 cache directory 155 of L2 cache 116, via step 208. Next, it is determined if there is a hit in the L2 cache 116, via step 210. If there is a hit in the L2 cache, then the data is accessed in the L2 cache, via step 212. The data is then placed on the L1 reload bus 115 via the L2 reload bus 160 from the L2 cache 116, via step 214. Thereafter, the LSU pipeline is re-entered and the dependent instructions are released by the LSU 140, via step 216. Thereafter, the L1 reload data is forwarded to the LSU 140, via step 218. Finally, the instructions are finished, via step 206. Typically these instructions are finished on a cache line basis. If there is not a hit in the L2 cache, then the next higher level of the cache hierarchy is accessed, via step 220, and the L2 reload data is forwarded, via step 222. Steps 212-218 are then performed.
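The conventional flow of Figure 2 can be traced as an ordered list of actions, which makes the key property easy to check: the dependent instructions are released only after the L2 tag lookup has resolved. The function name `conventional_load_flow` and the action strings are hypothetical paraphrases of the flow-chart steps, and timing is not modelled.

```python
from __future__ import annotations


def conventional_load_flow(l1_hit: bool, l2_hit: bool) -> list[str]:
    """Ordered actions for a load under the conventional method of Figure 2."""
    steps = ["enter LSU pipeline", "check L1"]        # steps 202, 204
    if l1_hit:
        return steps + ["finish"]                     # step 206
    steps.append("L2 tag lookup")                     # steps 208, 210
    if not l2_hit:
        # L2 miss: fetch from the next level, then reload the L2 (steps 220, 222).
        steps += ["access next level", "forward L2 reload data"]
    steps += [
        "access L2 data",                             # step 212
        "place data on L1 reload bus",                # step 214
        "re-enter LSU pipeline; release dependents",  # step 216
        "forward L1 reload data",                     # to LSU 140
        "finish",                                     # step 206
    ]
    return steps


flow = conventional_load_flow(l1_hit=False, l2_hit=True)
release = next(i for i, s in enumerate(flow) if "release dependents" in s)
print(release > flow.index("L2 tag lookup"))  # True: dependents wait for the L2 result
```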

[0034] A problem with the above-identified conventional system is that the dependent instructions are not released until it has been determined whether the data is in the L2 cache, which degrades the overall performance of the processor. It has been determined that additional cycles are required while waiting for the L2 cache determination.
[0035] A method and system in accordance with a preferred embodiment of the present invention cause instructions that create dependencies (such as a load instruction) to signal an issue queue (which is responsible for issuing instructions with resolved conflicts) in advance that the instruction will complete in a predetermined number of cycles. In a preferred embodiment, referring to Figure 3, the CIU 114 will signal the LSU 140 via signal 161 that it is assumed that the instruction will hit in the L2 cache 116. The issue queue 144 of the ISU 125 uses the signal to issue dependent instructions at an optimal time. If the instruction misses in the L2 cache 116, the cache hierarchy causes the instructions to be abandoned and re-executed when the data is available.
[0036] To describe the operation of a preferred embodiment of the present invention in more detail, refer now to the following discussion in conjunction with the accompanying figures. Figure 3 is a block diagram of a processor in accordance with a preferred embodiment of the present invention. Figure 3 is similar to Figure 1 except for a signal 161 from the CIU 114 which, at the appropriate time, causes the LSU 140 to release instructions dependent upon the load instruction. Accordingly, elements in Figure 3 which are similar to the elements in Figure 1 have the same reference numbers. Figure 4 is a flow chart illustrating a method for issuing dependent instructions in a data processing system in accordance with a preferred embodiment of the present invention.

[0037] Referring now to Figures 3 and 4 together, first the instruction enters the pipeline, via step 302. Next, it is determined whether the instruction is a hit in the data cache, via step 304. If the instruction is a hit, then it is finished, via step 306. However, if the instruction is not in the L1 cache, a guess signal 161 from the CIU 114 is provided to the LSU 140, which releases the dependent instructions, via step 307. This guess signal 161 is, in effect, a speculative guess that the instruction is a hit in the L2 cache, and therefore causes the release of its dependent instructions. Next, the L2 tag is accessed via the L2 cache directory 155, via step 308. Then, it is determined if there is a hit in the L2 cache, via step 310. If there is a hit in the L2 cache, then the data is accessed in the L2 cache, via step 312. The data is then placed on the L1 reload bus, via step 314. Thereafter, the LSU 140 pipeline is re-entered, via step 316. The L1 reload data is then forwarded to the LSU 140, via step 318. Finally, the instructions are finished, via step 306.

[0038] If the data is not in the L2 cache, then the guess of an L2 hit is wrong, via step 330, and the dependent instructions are cancelled. Thereafter, the next level of the cache hierarchy is accessed, via step 320. The dependent instructions are then released, via step 321. Thereafter, the L2 reload data is forwarded, via step 322. Then steps 314-318 are repeated.
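The speculative method of Figures 3 and 4 can be traced as an ordered list of actions. On an L2 hit, guess signal 161 releases the dependents before the L2 tag lookup resolves; on an L2 miss the guess is undone (step 330) and the dependents are released again once the next level responds. The function name `speculative_load_flow` and the action strings are hypothetical paraphrases of the flow-chart steps, and timing is not modelled.

```python
from __future__ import annotations


def speculative_load_flow(l1_hit: bool, l2_hit: bool) -> list[str]:
    """Ordered actions for a load under the speculative method of Figure 4."""
    steps = ["enter LSU pipeline", "check L1"]            # steps 302, 304
    if l1_hit:
        return steps + ["finish"]                         # step 306
    steps += [
        "guess L2 hit (signal 161); release dependents",  # step 307
        "L2 tag lookup",                                  # steps 308, 310
    ]
    if l2_hit:
        steps.append("access L2 data")                    # step 312
    else:
        steps += [
            "guess wrong; cancel dependents",             # step 330
            "access next level",                          # step 320
            "release dependents",                         # step 321
            "forward L2 reload data",                     # step 322
        ]
    steps += [
        "place data on L1 reload bus",                    # step 314
        "re-enter LSU pipeline",                          # step 316
        "forward L1 reload data",                         # step 318
        "finish",                                         # step 306
    ]
    return steps


hit = speculative_load_flow(l1_hit=False, l2_hit=True)
release = next(i for i, s in enumerate(hit) if "release dependents" in s)
print(release < hit.index("L2 tag lookup"))  # True: released before the lookup resolves
miss = speculative_load_flow(l1_hit=False, l2_hit=False)
print(miss[4:7])  # ['guess wrong; cancel dependents', 'access next level', 'release dependents']
```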

[0039] Accordingly, by speculatively releasing the dependent instructions via the guess signal before it is known whether the instruction is in the L2 cache, the performance of the processor is significantly improved. A speculative guess of a hit in the L2 cache is reliable because the L2 cache is typically very large and has a high probability of a hit. On an L2 miss, the instruction re-enters the LSU pipeline and fails to return data. The LSU then releases any held dependent instructions and they are then cancelled. This uses pipeline slots, but the cost is very small compared with the gain accomplished when there is a hit in the L2 cache.
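The trade-off argued in [0039] can be made explicit with a simple expected-value model, which is an assumption for illustration rather than an analysis from this document. Let p be the L2 hit rate for loads that miss in the L1 cache, S the cycles saved per L2 hit by releasing dependents early, and C the pipeline slots wasted per L2 miss by cancelled dependents. Speculation pays off on average when p·S - (1 - p)·C > 0, i.e. when p > C / (S + C). The function names and the example numbers below are hypothetical; the document gives no latencies or hit rates.

```python
def expected_gain(p_l2_hit: float, saved_cycles: float, miss_cost: float) -> float:
    """Expected cycles saved per L1 miss by speculatively releasing dependents.

    All three inputs are hypothetical; the document does not quantify them.
    """
    return p_l2_hit * saved_cycles - (1.0 - p_l2_hit) * miss_cost


def break_even_hit_rate(saved_cycles: float, miss_cost: float) -> float:
    """L2 hit rate above which speculation helps on average: C / (S + C)."""
    return miss_cost / (saved_cycles + miss_cost)


# Illustrative numbers only: 4 cycles saved per L2 hit, 2 slots wasted per L2 miss.
print(round(expected_gain(0.9, 4, 2), 2))   # 3.4
print(round(break_even_hit_rate(4, 2), 2))  # 0.33
```

With a large L2 cache the hit rate is usually well above such a break-even point, which is the intuition behind the claim that the miss penalty is small relative to the gain.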

[0040] A method for optimally issuing instructions that are related to a first instruction in a data processing system is disclosed. The processing system includes a primary and secondary cache. The method and system comprise speculatively indicating a hit of the first instruction in a secondary cache and releasing the dependent instructions. The method and system include determining if the first instruction is within the secondary cache. The method and system further include providing data related to the first instruction from the secondary cache to the primary cache when the instruction is within the secondary cache.



Claims

1. A method for optimally issuing instructions that are dependent on a first instruction in a data processing system, the processing system including a primary and secondary cache, the method comprising the steps of:

(a) speculatively indicating a hit of the first instruction in a secondary cache and releasing the dependent instructions;

(b) determining if the first instruction is within the secondary cache; and

(c) providing data related to the first instruction and the dependent instructions from the secondary cache to the primary cache when the first instruction is within the secondary cache.

2. The method of claim 1 wherein the first instruction comprises a load instruction.

3. The method of claim 2 wherein the primary cache comprises a data L1 cache.

4. The method of claim 3 wherein the secondary cache comprises an L2 cache.

5. The method of claim 4 which includes the step of:

(d) cancelling the load instruction and its dependent instructions when the first instruction is not within the L2 cache.

6. A processor for optimally issuing instructions that are dependent on a first instruction comprising:

an execution unit for issuing instructions;

a primary cache coupled to the execution unit;

a secondary cache; and

a core interface unit coupled to the primary cache, the secondary cache and the execution unit, the core interface unit for providing a signal to the execution unit when a first instruction is not a hit in the primary cache, the signal causing the execution unit to guess that a hit of the first instruction has occurred in the secondary cache and speculatively release instructions that are dependent upon the first instruction.

7. The processor of claim 6 wherein the first instruction comprises a load instruction.

8. The processor of claim 7 wherein the primary cache comprises a data L1 cache.

9. The processor of claim 8 wherein the secondary cache comprises an L2 cache.

10. The processor of claim 9 wherein the execution unit comprises a load store unit.